Mehryar Mohri
Speech Recognition
Courant Institute of Mathematical Sciences

Homework assignment 3 (Solution)
Parts 2 and 3 written by David Alvarez

1. For this question, it is recommended that you use the GRM library together with the FSM or OpenFst libraries. In fact, try as much as possible to use the utilities of these libraries to answer the questions. However, you need to justify your responses and not just mention the library utilities used.

(a) Download the following training corpus S and test corpus Ŝ:

(b) Extract the vocabulary Σ of S and define a start and end symbol.

(c) Create the following language models:
- a bigram back-off model;
- a trigram back-off model.
These can be created as indicated in the lecture slides using the utilities grmmake and grmconvert. Report for each of the weighted automata obtained:
- the number of states;
- the number of transitions;
- the number of ɛ-transitions;
- the number of n-grams found (n = 2 for bigram models, n = 3 for trigram models).
For these questions, you can use the utility fsminfo of the FSM library. You should, however, explain how you determine the number of n-grams.

The number of states, transitions, and ɛ-transitions are obtained directly using fsminfo. For the bigram model, by construction, the number of unigrams is the number of states minus one (the back-off state), minus the initial and final states if one wishes to exclude the start and end symbols. By construction, the number of bigrams is the number of non-ɛ-transitions minus the number of non-ɛ-transitions leaving the back-off state, since each non-ɛ-transition labeled with b that leaves a non-back-off state a corresponds precisely to an occurrence of the bigram ab. Now, there is exactly one non-ɛ-transition from the back-off state to every state except the back-off state itself and the initial state. Thus,

  N(bigrams) = N(transitions) - N(ɛ-transitions) - N(states) + 2.

Since every state other than the back-off state and the final state has exactly one ɛ back-off transition, N(ɛ-transitions) = N(states) - 2, and the formula can be rewritten as

  N(bigrams) = N(transitions) - 2 N(states) + 4.

The number of trigrams can be obtained in a similar fashion from the trigram language model as

  N(trigrams) = N(transitions) - 2 N(states) + 4 - N(bigrams).
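As a quick sanity check (not part of the original solution), these formulas can be applied directly to the state and arc counts printed by fsminfo. The Python sketch below uses placeholder values that should be replaced by the counts of your own models:

  # Sketch: recover n-gram counts from fsminfo output using the formulas above.
  def count_bigrams(num_states, num_arcs):
      # N(bigrams) = N(transitions) - 2 N(states) + 4
      return num_arcs - 2 * num_states + 4

  def count_trigrams(num_states, num_arcs, num_bigrams):
      # N(trigrams) = N(transitions) - 2 N(states) + 4 - N(bigrams)
      return num_arcs - 2 * num_states + 4 - num_bigrams

  # Placeholder values: read the actual numbers off `fsminfo train.2.lm.fsm`
  # and `fsminfo train.3.lm.fsm`.
  n_bigrams = count_bigrams(num_states=0, num_arcs=0)
  n_trigrams = count_trigrams(num_states=0, num_arcs=0, num_bigrams=n_bigrams)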
If you constructed the bigram model with

  $ grmcount -n 2 -s 1 -f 2 train.far | grmmake > train.2.lm.fsm

you should have:

  # of states    # of arcs    # of ɛ-transitions    # of bigrams

Similarly, for the trigram model constructed with

  $ grmcount -n 3 -s 1 -f 2 train.far | grmmake > train.3.lm.fsm

you should have:

  # of states    # of arcs    # of ɛ-transitions    # of trigrams

(d) Randomly generate 100 sequences from the first model and compare the likelihood given by the two models to the sample formed by these sentences.

It is not hard to generate random sentences from a model using fsmrandgen. While the judgment may be subjective and depend on the sentences you have generated, in general the trigram model should be closer to the sentences that served for training, and its output could thus appear closer to English than that of the bigram model.
(e) Compute the perplexity of these models using the test corpus.

If the n-gram language model is represented as a weighted automaton A over the log semiring, then, by definition, the negative log-probability of the text is obtained by summing, in the log semiring, the weights of all paths of A ∘ B, where B is a deterministic automaton representing the text. Thus composition followed by a shortest-distance algorithm (over the log semiring) yields the result, which can then be used to compute the perplexity of the model. The shortest distance can be obtained using the utilities fsmpush or fsmpotentials. The automaton B can be a long linear chain; instead, one can use a transducer B that is the union of the sentences and maps each sentence to its rank in the text. A ∘ B followed by determinization then gives the negative log-probability of each sentence.

However, the models created in (c) are defined over the tropical semiring, not the log semiring. Also, the negative log-probability of each sentence is computed separately because of the presence of start and end symbols. A standard shortest-distance algorithm (over the tropical semiring) can be used to compute the negative log-probability of each sentence from A ∘ X_s, where X_s is an automaton representing sentence s. The sum over all sentences s can then be used to compute the perplexity.

A (not very fast) way of computing the perplexity on the test set is:

  cat test.far | \
    farfilter "fsmcompose lm.fsm - | fsmrmepsilon | fsmdeterminize | fsmpush -c -f" | \
    farprintstrings -c -i labels | \
    awk '{ tot_words += NF - 2; h += $NF / log(2.0) } END { print 2.0 ^ (h / tot_words) }'

For the bigram model the perplexity should be around 173; for the trigram model it should be around 102.

(f) Shrink both of these models with the option -s4. What are the perplexity estimates for these models?

The models can be shrunken using grmshrink. The perplexities can be computed as in the previous question. They are expected to be higher: around 223 for the bigram model and 175 for the trigram model.
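Both (e) and (f) rely on the same aggregation of per-sentence costs into a perplexity, which is what the awk one-liner above performs. Written out in Python, it looks roughly like this (a sketch, not part of the original solution; it assumes the per-sentence negative log-probabilities, in nats, and the per-sentence token counts have already been extracted, with whatever convention for the start and end symbols you prefer; the awk command counts NF - 2 tokens per line):

  import math

  def perplexity(sentence_costs, sentence_lengths):
      """Perplexity from per-sentence negative log probabilities (in nats)."""
      total_bits = sum(cost / math.log(2.0) for cost in sentence_costs)
      total_words = sum(sentence_lengths)
      return 2.0 ** (total_bits / total_words)

This is 2 raised to the average number of bits per word, that is, 2 to the cross-entropy of the model on the test text.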
2. Class-based model

(a) First, for information purposes, we obtain the ten most frequent bigrams. For this we can use the SRILM toolkit to output the bigrams and their counts, and then sort them:

  ngram-count -text ../train.txt -order 2 -write-order 2 -write BigramCounts.txt
  sort -k2 -t $'\t' -n BigramCounts.txt

This gives us the following most frequent bigrams and their counts (sorted by count):

  cts vs       2675
  for the      2676
  mln vs       3206
  said the     3579
  said it      4062
  mln dlrs     4497
  in the       6175
  of the       6779
  said </s>    7782
  <s> the

Note that the mutual information can be approximated as

  I(w1, w2) ≈ log2( N c(w1 w2) / (c(w1) c(w2)) ),

where N = |V| is the corpus size. Using this formula, we can obtain the 20 bigrams with the highest MI with the following simple Python implementation, which uses the unigram and bigram counts obtained with SRILM:

  import csv
  import math
  from collections import Counter

  BiCounter = Counter()
  UniCounter = Counter()
  MutualInfo = Counter()

  with open('BigramCounts.txt', 'r') as f:
      reader = csv.reader(f, delimiter='\t')
      for row in reader:
          bigram = row[0]
          BiCounter[bigram] = float(row[1])

  with open('UnigramCounts.txt', 'r') as f:
      reader = csv.reader(f, delimiter='\t')
      for row in reader:
          UniCounter[row[0]] = float(row[1])
  totalbigrams = sum(BiCounter.values())
  totalunigrams = sum(UniCounter.values())

  for bigram in list(BiCounter):
      unigrams = bigram.split()
      if BiCounter[bigram] > 0:
          if (UniCounter[unigrams[0]] * UniCounter[unigrams[1]]) > 0:
              MutualInfo[bigram] = math.log(
                  (totalunigrams * BiCounter[bigram]) /
                  (UniCounter[unigrams[0]] * UniCounter[unigrams[1]]), 2)
          else:
              MutualInfo[bigram] = float('inf')

  MostCommon = MutualInfo.most_common(20)

The results are shown in Table 1.

Table 1: Mutual Information (columns w1, w2, I(w1, w2)); the 20 highest-MI pairs found are

  blankets soap rwevs harlan ullman parra gil andrzej doroscz mochtar
  kusumaatmadja heron cay unexplored moere fiberglass architecural brewers
  castlemaine rm nicel beau bolter robbie mupawose hissene habre zheng
  tuobin ramar intercapital culpable homicide
It is no surprise that all of these bigrams have the same MI: for a pair of words that each appear only once and appear together as a bigram,

  I(w1, w2) ≈ log2( N c(w1 w2) / (c(w1) c(w2)) ) = log2( N · 1 / (1 · 1) ) = log2 N.

This makes clear what the mutual information is measuring: bigrams whose words frequently appear together have high values of MI, and could thus be considered as a single block phrase in the language model.

(b) Note that it does not make much sense to define classes for pairs of numbers (of which the corpus has many examples): such pairs naturally appear only once, and because each individual number has a low probability of occurring, they end up with high values of mutual information. Filtering out these cases, we now pick the 2000 bigrams with the highest MI in Python, and then write a text-file dictionary with them. With Python, we can very easily create the classes and write the text files in the format required to define a transducer for the FSM library. We use:

  MostCommon2000 = MutualInfoNoNum.most_common(2000)
  nonmappedunigrams = set(UniCounter)
  Dict = dict([jj[0], jj[1]] for jj in MostCommon2000)

  f = open("classes.txt", "w")
  f2 = open("ClassesCorpus.txt", "w")
  label = 0  # starting label of the class symbols in the symbols file (first unused label id)

  for key in Dict.keys():
      unigrams = key.split()
      klass = unigrams[0] + "_" + unigrams[1]  # class symbol for the pair
      f.write("0 0 " + unigrams[0] + " " + klass + "\n")
      f.write("0 0 " + unigrams[1] + " " + klass + "\n")
      f2.write(klass + " " + str(label) + "\n")
      label += 1
      nonmappedunigrams = nonmappedunigrams - set([unigrams[0]]) - set([unigrams[1]])

  f.close()
  f2.close()

  f = open("classes.txt", "a")
  for unigram in nonmappedunigrams:
      f.write("0 0 " + unigram + " " + unigram + "\n")
  f.write("0\n")  # mark state 0, the only state, as final
  f.close()
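To make the file formats concrete, here is roughly what a few lines of these files would look like for a hypothetical class built from the bigram "mln dlrs" (the class-symbol naming and the example word "cocoa" are illustrative only, and the integer label is whatever value the running counter assigns):

  classes.txt (AT&T FSM text format, every arc looping on state 0):
    0 0 mln mln_dlrs
    0 0 dlrs mln_dlrs
    0 0 cocoa cocoa      (an unmapped unigram is mapped to itself)
    0                    (state 0 is final)

  ClassesCorpus.txt (new symbols to be appended to the symbol table):
    mln_dlrs <next free label id>

Composed with a sentence and projected onto its output side, this transducer rewrites each member of a selected pair into its class symbol and leaves every other word unchanged, which is exactly what the next step needs.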
Then, using these files, we can create the transducer that maps words into the classes:

  fsmcompile -iClassesCorpus.syms -oClassesCorpus.syms -t < classes.txt > ClassMapper.fsm

Then, to create the LM, we first read the training sentences, compose them with the class-mapper transducer, and project onto the output labels. We thus obtain a far file with the rewritten sentences:

  farcompilestrings -iClassesCorpus.syms train.txt > Corpus.far
  farfilter "fsmcompose - ClassMapper.fsm | fsmproject -2" < Corpus.far > MappedCorpus.far

Finally, we can build the language model with the classes as follows:

  grmcount -n 2 -s 1 -f 2 MappedCorpus.far | grmmake > BiModelClasses.fsm

The construction of the class-based trigram model is analogous.

(c) Finally, we evaluate the class-based models by computing their perplexity. In order to do this, we need to preprocess the test sentences by composing them with the ClassMapper as before. Then we compute the perplexity just as we did in part 1(e). The results for these class-based models are:

Table 2: Performance of the class-based models

  Model      Perplexity (with </s>)    Perplexity (without </s>)
  Bigram
  Trigram

As we can see, grouping bigrams with large mutual information into classes helped to significantly improve the perplexity in all cases.

3. Maxent Models

After an intricate and complicated installation of the packages liblbfgs, SRILM, and its extension for maxent models, we train a maxent model with bigram features with the following command:

  ngram-count -text ../train.txt -maxent -lm MaxEntBigram -order 2
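For reference (background not spelled out in the original write-up), the model being estimated is a conditional maximum-entropy model: for a history h and word w,

  P(w | h) = exp( Σ_i λ_i f_i(h, w) ) / Z(h),   where   Z(h) = Σ_{w'} exp( Σ_i λ_i f_i(h, w') ),

and the features f_i are indicators of n-gram events (here, unigram and bigram identities). The weights λ_i are fit by the regularized optimizer whose progress is reported in the output below.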
The output is the following:

  Iteration 99
  No of NaNs in logzs: 0, No infs: 0
  dual is
  regularized dual is
  norm of gradient =
  norm of regularized gradient =
  No of NaNs in logzs: 0, No infs: 0
  dual is
  regularized dual is
  norm of gradient =
  norm of regularized gradient =
  Iteration 100
  OWL BFGS terminated with the stopping criterion
  Duration: 11 seconds

From this we can see that the L-BFGS optimization has stopped because it reached the maximum-iteration criterion (100 iterations).

Similarly, we build the maxent model with trigram features:

  ngram-count -text ../train.txt -maxent -lm MaxEntTrigram -order 3

The process, which takes significantly more time to run, now actually reaches convergence, although close to the 100-iteration limit:

  Iteration 97
  No of NaNs in logzs: 0, No infs: 0
  dual is
  regularized dual is
  norm of gradient =
  norm of regularized gradient =
  Iteration 98
  OWL BFGS resulted in convergence
  Duration: 68 seconds

Now we compute perplexities on the same test set as in part 1. The commands are:

  ngram -maxent -lm MaxEntBigram -ppl ../test.txt -debug 2
  ngram -maxent -lm MaxEntTrigram -ppl ../test.txt -debug 3

The result for the bigram model is the following:

  sentences, words, 5257 OOVs
  0 zeroprobs, logprob= ppl= ppl1=

As in the previous section, ngram reports the perplexity both with the end-of-sentence tokens counted (ppl) and without them (ppl1). For the trigram model, on the other hand, we have:

  sentences, words, 5257 OOVs
  0 zeroprobs, logprob= ppl= ppl1=

For completeness, we also try another model that the maxent patch allows us to create: this time, an interpolated mixture of the maxent bigram and trigram models.

  ngram -maxent -lm MaxEntTrigram -mix-maxent -mix-lm MaxEntBigram -ppl ../test.txt -bayes 0

The results, however, show that this mixture model does not perform as well as the pure trigram model:

  sentences, words, 5257 OOVs
  0 zeroprobs, logprob= ppl= ppl1=

Comparing with the results obtained in the first part, we see that the perplexities of the two approaches are very similar, with a slight but consistent advantage for the n-gram back-off models. In terms of efficiency, computing perplexities for the maxent models seems faster than for the usual n-gram models, but not faster than for the pruned n-gram models. It should also be noted that L-BFGS is known to converge very fast, so the core of the maxent training is already close to the state of the art in terms of efficiency.
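As a closing note on how the reported numbers relate to one another: following SRILM's standard definitions, ppl and ppl1 can be recovered from the logprob line of these reports. The small Python sketch below is not from the original write-up:

  def srilm_perplexities(logprob10, words, sentences, oovs, zeroprobs=0):
      """Recover SRILM's ppl and ppl1 from an `ngram -ppl` report.

      logprob10 is the total log10 probability printed by ngram; ppl counts
      the end-of-sentence tokens (one per sentence), ppl1 does not; OOV and
      zero-probability words are excluded from both denominators.
      """
      denom = words - oovs - zeroprobs
      ppl = 10.0 ** (-logprob10 / (denom + sentences))
      ppl1 = 10.0 ** (-logprob10 / denom)
      return ppl, ppl1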
More information